Abstract

A density ratio is defined by the ratio of two probability densities. We study the inference problem of density ratios and apply a semiparametric density-ratio estimator to the two-sample homogeneity test. In the proposed test procedure, the $f$-divergence between two probability densities is estimated using a density-ratio estimator. The $f$-divergence estimator is then exploited for the two-sample homogeneity test. We derive the optimal estimator of the $f$-divergence in the sense of the asymptotic variance, and then investigate the relation between the proposed test procedure and the existing score test based on the empirical likelihood estimator. Through numerical studies, we illustrate the adequacy of the asymptotic theory for finite-sample inference.

1 Introduction 

In this paper, we study the two-sample homogeneity test under semiparametric density-ratio models. The estimator of the density ratio is exploited to obtain a test statistic. For two probability densities, $p_n(x)$ and $p_d(x)$, over a probability space $\mathcal{X}$, the density ratio $r(x)$ is defined as the ratio of these densities, that is,

$$ r(x) = \frac{p_n(x)}{p_d(x)}, $$

in which $p_n$ ($p_d$) denotes the "numerator" ("denominator") of the density ratio. For statistical examples and motivations of the density-ratio model, see Qin [17], Cox and Ferry [5], Kay and Little [10], and the references therein. Qin [17] has studied the inference of the density ratio under retrospective sampling plans, and proved that the estimating function obtained from the prospective likelihood is optimal in a class of unbiased estimating functions under semiparametric density-ratio models. As a similar approach, Cheng and Chu [4] have studied a semiparametric density-ratio estimator based on logistic regression.

The density ratio is closely related to the inference of divergences. A divergence is a discrepancy measure between pairs of multivariate probability densities, and the $f$-divergence [1, 6] is a class of divergences based on the ratio of two probability densities. For a strictly convex function $f$ satisfying $f(1) = 0$, the $f$-divergence between two probability densities $p_d(x)$ and $p_n(x)$ is defined by

$$ D_f(p_d, p_n) = \int p_d(x)\, f\!\left(\frac{p_n(x)}{p_d(x)}\right) dx. \tag{1} $$

Since $f$ is strictly convex, the $f$-divergence is non-negative and takes zero if and only if $p_n = p_d$ holds. Popular divergences such as the Kullback-Leibler (KL) divergence [13], the Hellinger distance, and the Pearson divergence are included in the $f$-divergence class. In statistics, machine learning, and information theory, the $f$-divergence is often exploited as a metric between probability distributions, even though the divergence does not necessarily satisfy the definition of a metric.
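As a small illustration of the definition of the $f$-divergence, the following sketch evaluates it for discrete distributions, where the integral becomes a sum. The function name `f_divergence` and the example distributions are hypothetical choices made here, not part of the paper.

```python
import numpy as np

def f_divergence(p_d, p_n, f):
    """D_f(p_d, p_n) = sum_x p_d(x) f(p_n(x) / p_d(x)) for discrete densities."""
    p_d = np.asarray(p_d, dtype=float)
    p_n = np.asarray(p_n, dtype=float)
    r = p_n / p_d                      # density ratio r(x) = p_n(x) / p_d(x)
    return float(np.sum(p_d * f(r)))

# Convex generators with f(1) = 0.
f_kl = lambda r: r - 1.0 - np.log(r)          # yields KL(p_d || p_n)
f_pearson = lambda r: (r - 1.0) ** 2 / r      # a Pearson-type divergence

# Hypothetical example distributions on three points.
p_d = np.array([0.2, 0.5, 0.3])
p_n = np.array([0.3, 0.4, 0.3])

kl = f_divergence(p_d, p_n, f_kl)
kl_direct = float(np.sum(p_d * np.log(p_d / p_n)))   # textbook KL formula
pearson = f_divergence(p_d, p_n, f_pearson)
```

Both values are positive here, and each divergence is zero when the two distributions coincide, in line with the strict convexity argument above.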

A central topic in this line of research is to estimate the divergence based on samples from each probability distribution. A typical approach is to exploit non-parametric estimators of the probability densities, and then estimate, say, the KL-divergence based on the estimated probability densities.

In order to estimate the $f$-divergence between two probabilities, Keziou [11] has exploited the conjugate expression of the $f$-divergence. Based on the conjugate expression, Keziou and Leoni-Aubin [12], and Broniatowski and Keziou [3] have developed $f$-divergence estimators for semiparametric density-ratio models. Keziou and Leoni-Aubin [12] have applied the $f$-divergence estimator to the one-sample test. Recently, Nguyen et al. [16] have developed a kernel-based estimator of the $f$-divergence using a non-parametric density-ratio model.

Once the divergence between two probability densities is estimated, the homogeneity test can be conducted. In the homogeneity test, the null hypothesis $H_0: p_n = p_d$ is tested against the complementary alternative $H_1: p_n \neq p_d$. If an estimate of $D_f(p_d, p_n)$ exceeds some positive threshold, the null hypothesis is rejected and the alternative is accepted. Keziou and Leoni-Aubin [12] have studied the homogeneity test using an $f$-divergence estimator for semiparametric density-ratio models. On the other hand, Fokianos et al. [7] adopted a more direct approach: they proposed the score test derived from the empirical likelihood estimator of density ratios. In this paper, we consider the optimality of $f$-divergence estimators, and investigate the relation between the test statistic using the $f$-divergence estimator and the score test derived from the empirical likelihood estimator.

The rest of this paper is organized as follows. In Section 2, we introduce unbiased estimators of density ratios for semiparametric density-ratio models. We also define some notation which is used throughout this paper. In Section 3, we consider the asymptotics of an $f$-divergence estimator. The main results of this paper are presented in Section 4 and Section 5. We present the optimal estimator for the $f$-divergence, which is then exploited for the two-sample homogeneity test. Broniatowski and Keziou [3] proposed an estimator exploiting the conjugate expression of the $f$-divergence, but they discussed neither its optimality nor its efficiency. A main contribution of this paper is to present the optimal estimator of the $f$-divergence in the sense of asymptotic variance under semiparametric density-ratio models. Then, we propose a test statistic based on the optimal $f$-divergence estimator, and investigate its power function. Numerical studies are provided in Section 6, illustrating the adequacy of our asymptotic theory for finite-sample inference. Section 7 is devoted to concluding remarks. Some calculations are deferred to the Appendix.

2 Estimation of density ratio 

We introduce the method of estimating density ratios according to Qin [17]. Let $p_n(x)$ and $p_d(x)$ be two probability densities on the probability space $\mathcal{X}$. Their density ratio is defined as

$$ r(x) = \frac{p_n(x)}{p_d(x)} $$

for $x \in \mathcal{X}$. Two sets of samples are independently generated from each probability:

$$ x_1^{(n)}, \ldots, x_{m_n}^{(n)} \sim_{\mathrm{i.i.d.}} p_n, \qquad x_1^{(d)}, \ldots, x_{m_d}^{(d)} \sim_{\mathrm{i.i.d.}} p_d. $$

The model for the density ratio is denoted by $r(x;\theta)$ with the parameter $\theta \in \Theta \subset \mathbb{R}^d$. We assume that the true density ratio is represented as

$$ r(x) = \frac{p_n(x)}{p_d(x)} = r(x;\theta^*) $$

with some $\theta^* \in \Theta$. The model for the density ratio $r(x;\theta)$ is regarded as a semiparametric model for probability densities. That is, even if $r(x;\theta^*) = p_n(x)/p_d(x)$ is specified, there are still infinitely many degrees of freedom for the probability densities $p_n$ and $p_d$.

The moment matching estimator for the density ratio has been proposed by Qin [17]. Let $\eta(x;\theta) \in \mathbb{R}^d$ be a vector-valued function from $\mathcal{X} \times \Theta$ to $\mathbb{R}^d$, and define the estimation function $Q_\eta$ as

$$ Q_\eta(\theta) := \frac{1}{m_d}\sum_{i=1}^{m_d} r(x_i^{(d)};\theta)\,\eta(x_i^{(d)};\theta) - \frac{1}{m_n}\sum_{j=1}^{m_n} \eta(x_j^{(n)};\theta). $$

Since $p_n(x) = r(x;\theta^*)\,p_d(x)$ holds, the expectation of $Q_\eta(\theta)$ over the observed samples vanishes at $\theta = \theta^*$. In addition, the estimation function $Q_\eta(\theta)$ converges to its expectation in the large sample limit. Thus, the estimator $\hat\theta$ defined as a solution of the estimating equation

$$ Q_\eta(\hat\theta) = 0 $$

is statistically consistent under mild assumptions; see [17] for details.
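The estimating equation can be solved numerically. The sketch below is a minimal illustration, not the procedure used in the paper: it assumes a one-dimensional exponential model $r(x;\theta) = \exp(\theta_1 + \theta_2 x)$, takes the simple (non-optimal) choice $\eta(x;\theta) = (1, x)^T$, and applies plain Newton iterations. The names `phi` and `moment_matching` and the Gaussian test data are hypothetical.

```python
import numpy as np

def phi(x):
    """Feature map phi(x) = (1, x); a hypothetical choice for illustration."""
    return np.column_stack([np.ones_like(x), x])

def moment_matching(x_d, x_n, n_iter=50, tol=1e-10):
    """Solve Q_eta(theta) = 0 for r(x; theta) = exp(theta^T phi(x)), eta = phi."""
    phi_d, phi_n = phi(x_d), phi(x_n)
    theta = np.zeros(phi_d.shape[1])
    for _ in range(n_iter):
        r_d = np.exp(phi_d @ theta)
        # Q(theta) = mean over p_d sample of r * eta, minus mean over p_n sample of eta
        q = (r_d[:, None] * phi_d).mean(axis=0) - phi_n.mean(axis=0)
        # Jacobian dQ/dtheta = mean over p_d sample of r * phi phi^T
        jac = (phi_d * r_d[:, None]).T @ phi_d / len(x_d)
        step = np.linalg.solve(jac, q)
        theta = theta - step
        if np.max(np.abs(step)) < tol:
            break
    return theta

rng = np.random.default_rng(0)
x_d = rng.normal(0.0, 1.0, size=20000)   # sample from p_d = N(0, 1)
x_n = rng.normal(0.5, 1.0, size=20000)   # sample from p_n = N(0.5, 1)
theta_hat = moment_matching(x_d, x_n)
```

For these two Gaussians the true log-ratio is $0.5x - 0.125$, so `theta_hat` should be close to $(-0.125, 0.5)$ up to sampling error.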

The moment matching estimation of the density ratio contains a wide range of estimators. Several authors such as Nguyen et al. [16], Keziou and Leoni-Aubin [12], Sugiyama et al. [21] and Kanamori et al. [9] have proposed various density-ratio estimators. These estimators with a finite-dimensional model $r(x;\theta)$ can all be represented as moment matching estimators. These existing methods, however, are intended to be applied with kernel methods which have been developed in machine learning [20, 23]. As another approach to density ratio estimation, Kwik and Mielniczuk [14], Jacob and Oliveira [8], and Bensaid and Fabre [2] have exploited the kernel density estimator, and studied convergence properties of estimators under several theoretical assumptions.
Before we present the asymptotic results, we prepare some notation. Let $N_d(\mu, \Sigma)$ be the $d$-dimensional normal distribution with mean vector $\mu$ and variance-covariance matrix $\Sigma$. The dimension $d$ may be dropped if there is no confusion. $E_n[\cdot]$ and $V_n[\cdot]$ denote the expectation and the variance (or the variance-covariance matrix for multi-dimensional random variables) under the probability $p_n$, and $E_d[\cdot]$ and $V_d[\cdot]$ are defined in the same way for the probability $p_d$. The expectation and the variance under all samples, $x_i^{(n)}$ $(i = 1, \ldots, m_n)$ and $x_j^{(d)}$ $(j = 1, \ldots, m_d)$, are denoted as $E[\cdot]$ and $V[\cdot]$, respectively. The covariance matrix between two random variables under all samples is denoted as $\mathrm{Cov}[\cdot,\cdot]$. The first and second derivatives of the function $f: \mathbb{R} \to \mathbb{R}$ are denoted as $f'$ and $f''$, respectively. Let $\partial_i$ be the partial differential operator with respect to the parameter $\theta_i$, that is, $\partial_i = \partial/\partial\theta_i$. The gradient column vector of the function $g$ with respect to the parameter $\theta$ is denoted as $\nabla g$, i.e., $\nabla g = (\partial_1 g, \ldots, \partial_d g)^T$. For a vector-valued function $\eta(x;\theta) = (\eta_1(x;\theta), \ldots, \eta_d(x;\theta))$, let $\mathcal{L}[\eta(x;\theta)]$ be the linear space

$$ \mathcal{L}[\eta(x;\theta)] := \Big\{ \sum_{k=1}^{d} a_k\, \eta_k(x;\theta) \;\Big|\; a_1, \ldots, a_d \in \mathbb{R} \Big\}. $$

In this paper, the linear space $\mathcal{L}[\nabla \log r(x;\theta)]$ defined by

$$ \mathcal{L}[\nabla \log r(x;\theta)] := \Big\{ \sum_{k=1}^{d} a_k\, \partial_k \log r(x;\theta) \;\Big|\; a_1, \ldots, a_d \in \mathbb{R} \Big\} $$

plays the central role.

We introduce the asymptotics of density ratio estimation. Let $\rho$ and $m$ be

$$ \rho := \frac{m_n}{m_d}, \qquad m := \Big(\frac{1}{m_n} + \frac{1}{m_d}\Big)^{-1} = \frac{m_n m_d}{m_n + m_d}, $$

respectively, and let the $d$ by $d$ matrix $U_\eta$ be

$$ U_\eta = E_n[\eta(x;\theta)\, \nabla \log r(x;\theta)^T], $$

where $\eta(x;\theta)$ is a $d$-dimensional vector-valued function. Suppose that $U_\eta$ is non-degenerate in the vicinity of $\theta = \theta^*$. Below, the notation $\rho = m_n/m_d$ is also used for the large sample limit of $m_n/m_d$, and we assume that $0 < \rho < \infty$ holds even in the limit. The asymptotic expansion of the estimating equation $Q_\eta(\hat\theta) = 0$ around $\theta = \theta^*$ yields the following convergence in law,

$$ \sqrt{m}\,(\hat\theta - \theta^*) = -\sqrt{m}\, U_\eta^{-1} Q_\eta + o_p(1) \xrightarrow{d} N_d\Big(0,\; U_\eta^{-1}\, \frac{\rho V_d[r\eta] + V_n[\eta]}{\rho + 1}\, (U_\eta^T)^{-1}\Big), \tag{2} $$

in which $\theta$ is set to $\theta^*$. The formula above is derived from the equalities

$$ E[Q_\eta] = 0, \qquad m \cdot E[Q_\eta Q_\eta^T] = \frac{\rho V_d[r\eta] + V_n[\eta]}{\rho + 1}. $$

Qin [17] has shown that the prospective likelihood minimizes the asymptotic variance in the class of moment matching estimators. More precisely, for the density ratio model

$$ r(x;\theta) = \exp\{\alpha + \phi(x;\beta)\}, \qquad \theta = (\alpha, \beta) \in \mathbb{R} \times \mathbb{R}^{d-1}, \tag{3} $$

the vector-valued function $\eta_{\mathrm{opt}}$ defined by

$$ \eta_{\mathrm{opt}}(x;\theta) = \frac{1}{1 + \rho\, r(x;\theta)}\, \nabla \log r(x;\theta) \tag{4} $$

minimizes the asymptotic variance of (2).

3 Estimation of f-divergence

We consider the estimation of the $f$-divergence. As shown in (1), the $f$-divergence is represented as the expectation of the transformed density ratio $f(r(x))$, that is,

$$ D_f(p_d, p_n) = \int p_d(x)\, f\Big(\frac{p_n(x)}{p_d(x)}\Big)\, dx = \int p_d(x)\, f(r(x))\, dx $$

for $r(x) = p_n(x)/p_d(x)$. Once the density ratio is estimated by $r(x;\hat\theta)$, the $f$-divergence is also estimated by the empirical mean of $f(r(x;\hat\theta))$ over the samples from $p_d$. Here we consider an extended estimator. Suppose that the convex function $f$ is decomposed into two terms,

$$ f(r) = f_d(r) + r f_n(r). \tag{5} $$

Then the $f$-divergence is represented as

$$ \int p_d(x) f(r(x))\, dx = \int p_d(x) f_d(r(x))\, dx + \int p_n(x) f_n(r(x))\, dx \tag{6} $$

since $r(x) = p_n(x)/p_d(x)$ holds. Note that the decomposition (5) includes the conjugate representation $f(r) = -f^*(f'(r)) + r f'(r)$ with the conjugate function $f^*$ [19]. Keziou [11] has exploited the conjugate representation for the estimation of the $f$-divergence. The empirical variant of (6) provides an estimate of the $f$-divergence,

$$ \hat D_f = \frac{1}{m_d}\sum_{i=1}^{m_d} f_d(r(x_i^{(d)};\hat\theta)) + \frac{1}{m_n}\sum_{j=1}^{m_n} f_n(r(x_j^{(n)};\hat\theta)), \tag{7} $$

where the parameter $\hat\theta$ is estimated using the estimation function $Q_\eta$. Using the estimator $\hat D_f$, we can conduct the homogeneity test with the hypotheses

$$ H_0: p_n = p_d, \qquad H_1: p_n \neq p_d. \tag{8} $$

When the null hypothesis is true, the $f$-divergence $D_f(p_d, p_n)$ is equal to zero, and otherwise $D_f(p_d, p_n)$ takes a positive real value. Thus, the null hypothesis will be rejected when $\hat D_f > t$ holds, where $t$ is a positive constant determined from the significance level of the test.
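The estimator (7) is a pair of sample means, which the following sketch makes concrete for the KL case $f(r) = r - 1 - \log r$ with $f_d(r) = -\log r - 1$ and $f_n(r) = 1$. For simplicity, and as an assumption of this illustration only, the true density ratio of two Gaussians is plugged in instead of an estimate $r(x;\hat\theta)$; the function name `f_divergence_estimate` is hypothetical.

```python
import numpy as np

def f_divergence_estimate(r_d, r_n, f_d, f_n):
    """Empirical estimator (7): mean of f_d(r) over the p_d sample plus
    mean of f_n(r) over the p_n sample, where r is the density ratio."""
    return float(np.mean(f_d(r_d)) + np.mean(f_n(r_n)))

# KL case: f(r) = r - 1 - log r, decomposed as f_d(r) = -log r - 1, f_n(r) = 1.
f_d = lambda r: -np.log(r) - 1.0
f_n = lambda r: np.ones_like(r)

rng = np.random.default_rng(1)
x_d = rng.normal(0.0, 1.0, size=50000)   # sample from p_d = N(0, 1)
x_n = rng.normal(0.5, 1.0, size=50000)   # sample from p_n = N(0.5, 1)

# Illustration only: the true ratio p_n / p_d replaces the estimated ratio.
ratio = lambda x: np.exp(0.5 * x - 0.125)
d_hat = f_divergence_estimate(ratio(x_d), ratio(x_n), f_d, f_n)
```

Here `d_hat` should be close to $\mathrm{KL}(p_d \| p_n) = 0.125$; in practice the ratio would come from the moment matching estimator of Section 2.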

We consider the statistical properties of the estimator $\hat D_f$. The estimator (7) depends on two choices: one is the vector-valued function $\eta$ for the estimation of the density ratio, and the other is the decomposition of $f$, i.e., $f_d$ and $f_n$. For the decomposition $f(r) = f_d(r) + r f_n(r)$, let us define

$$ P_f := \sqrt{\frac{\rho}{\rho+1}}\, \frac{1}{\sqrt{m_d}} \sum_{i=1}^{m_d} \Big\{ f_d(r(x_i^{(d)};\theta)) - E_d[f_d(r(x;\theta))] \Big\} + \sqrt{\frac{1}{\rho+1}}\, \frac{1}{\sqrt{m_n}} \sum_{j=1}^{m_n} \Big\{ f_n(r(x_j^{(n)};\theta)) - E_n[f_n(r(x;\theta))] \Big\}, $$

and let the $d$-dimensional vector $c \in \mathbb{R}^d$ be

$$ c := E_n\big[\{f'(r(x;\theta)) - f_n(r(x;\theta))\}\, \nabla \log r(x;\theta)\big]. $$

Then, the first order asymptotic expansion of $\hat D_f$ with $f(r) = f_d(r) + r f_n(r)$ yields

$$ \sqrt{m}\,(\hat D_f - D_f) = P_f - \sqrt{m}\, c^T U_\eta^{-1} Q_\eta + o_p(1), \tag{9} $$

in which $D_f$ denotes $D_f(p_d, p_n)$ and the functions are evaluated at $\theta = \theta^*$. Based on the above formula, we derive the estimator attaining the minimum asymptotic variance of (9).
4 Optimal Estimator of f-divergence

We consider the optimal estimator of the $f$-divergence in the sense of the asymptotic variance. Some assumptions to be imposed are shown below.

Assumption 1. The density ratio model $r(x;\theta)$ and the function $f$ of the $f$-divergence satisfy the following conditions.

(a) The model $r(x;\theta)$ includes the constant function 1.

(b) For any $\theta \in \Theta$, $1 \in \mathcal{L}[\nabla \log r(x;\theta)]$ holds.

(c) $f(1) = f'(1) = 0$.

As shown in Remark 1 below, standard models of density ratios satisfy (a) and (b) of Assumption 1.

Remark 1. Let $\phi(x) = (\phi_1(x), \ldots, \phi_d(x))^T \in \mathbb{R}^d$ be a vector-valued function defined on $\mathcal{X}$ such that $\phi_1(x) = 1$. The exponential model $r(x;\theta) = \exp\{\theta^T \phi(x)\}$, that is, the model (3), satisfies (a) and (b) in Assumption 1. In the same way, we see that the linear model $r(x;\theta) = \theta^T \phi(x)$ also meets the conditions. Indeed, the linear space $\mathcal{L}[\nabla \log r(x;\theta)]$ is spanned by $\{\phi_1/r, \ldots, \phi_d/r\}$ and the equality $\theta^T \phi / r(x;\theta) = 1$ holds for all $\theta \in \Theta$.

We compare the asymptotic variances of two estimators of the $f$-divergence: one is the estimator $\hat D_f$ derived from the moment matching estimator using $\eta(x;\theta)$ and the decomposition $f(r) = f_d(r) + r f_n(r)$, and the other is the estimator $\bar D_f$ defined by the density ratio estimator using $\bar\eta(x;\theta)$ and the decomposition $f(r) = \bar f_d(r) + r \bar f_n(r)$. The asymptotic expansions of the two estimators are given as

$$ \sqrt{m}\,(\hat D_f - D_f) = P_f - \sqrt{m}\, c^T U_\eta^{-1} Q_\eta + o_p(1) $$

and

$$ \sqrt{m}\,(\bar D_f - D_f) = \bar P_f - \sqrt{m}\, \bar c^T U_{\bar\eta}^{-1} Q_{\bar\eta} + o_p(1), $$

respectively, where $\bar P_f$ and $\bar c$ are defined by

$$ \bar P_f := \sqrt{\frac{\rho}{\rho+1}}\, \frac{1}{\sqrt{m_d}} \sum_{i=1}^{m_d} \Big\{ \bar f_d(r(x_i^{(d)};\theta)) - E_d[\bar f_d(r(x;\theta))] \Big\} + \sqrt{\frac{1}{\rho+1}}\, \frac{1}{\sqrt{m_n}} \sum_{j=1}^{m_n} \Big\{ \bar f_n(r(x_j^{(n)};\theta)) - E_n[\bar f_n(r(x;\theta))] \Big\}, $$

$$ \bar c := E_n\big[\{f'(r(x;\theta)) - \bar f_n(r(x;\theta))\}\, \nabla \log r(x;\theta)\big], $$

and the functions are evaluated at $\theta = \theta^*$. In order to compare the variances of these estimators, we consider the following inequality,

$$ 0 \le V[\hat D_f - \bar D_f] = V[\hat D_f] - V[\bar D_f] - 2\,\mathrm{Cov}[\hat D_f - \bar D_f,\, \bar D_f]. $$

Suppose that the covariance above vanishes for any $\hat D_f$. Then we have the inequality

$$ V[\bar D_f] \le V[\hat D_f]. $$

This implies that the estimator $\bar D_f$ is the asymptotically optimal estimator for the $f$-divergence.

Under Assumption 1, some calculation of the covariance yields the equality

$$ m_d\, \mathrm{Cov}[\hat D_f - \bar D_f,\, \bar D_f] = E_n\Big[ \big\{ \bar f_n(r) - f_n(r) + \bar c^T U_{\bar\eta}^{-1} \bar\eta - c^T U_\eta^{-1} \eta \big\} \big\{ f(r) - (r + \rho^{-1}) \big( \bar f_n(r) + \bar c^T U_{\bar\eta}^{-1} \bar\eta \big) \big\} \Big], \tag{10} $$

in which $r$ denotes the density ratio $r(x)$ and the functions are evaluated at $\theta = \theta^*$. We study a sufficient condition under which the above covariance vanishes.

Theorem 1. Under Assumption 1, suppose that $\bar f_d(r(x;\theta))$, $\bar f_n(r(x;\theta))$, and $\bar\eta(x;\theta)$ satisfy

$$ f(r(x;\theta)) - (r(x;\theta) + \rho^{-1}) \big( \bar f_n(r(x;\theta)) + \bar c^T U_{\bar\eta}^{-1} \bar\eta(x;\theta) \big) \in \mathcal{L}[\nabla \log r(x;\theta)] \tag{11} $$

for all $\theta \in \Theta$. Then the estimator $\bar D_f$ using $\bar\eta$ and the decomposition $f(r) = \bar f_d(r) + r \bar f_n(r)$ uniformly attains the minimum asymptotic variance.

Proof. For any $p_n$ and $p_d$ such that $p_n(x)/p_d(x) = r(x;\theta)$, we have

$$ E_n\big[ \{ \bar f_n(r(x;\theta)) - f_n(r(x;\theta)) + \bar c^T U_{\bar\eta}^{-1} \bar\eta(x;\theta) - c^T U_\eta^{-1} \eta(x;\theta) \}\, \nabla \log r(x;\theta)^T \big] $$
$$ = E_n\big[ (\bar f_n(r(x;\theta)) - f_n(r(x;\theta)))\, \nabla \log r(x;\theta)^T \big] + \bar c^T - c^T = 0. $$

Hence, when (11) holds, the covariance (10) vanishes for any $\eta$ and any decomposition of $f$. □

Clearly, any optimal estimator of the $f$-divergence achieves the same asymptotic variance. In the following corollaries, we present some sufficient conditions for (11).
Corollary 2. Under Assumption 1, suppose that

$$ f(r(x;\theta)) - (r(x;\theta) + \rho^{-1})\, \bar f_n(r(x;\theta)) \in \mathcal{L}[\nabla \log r(x;\theta)] \tag{12} $$

holds for all $\theta \in \Theta$. Then, the function $\bar\eta = \eta_{\mathrm{opt}}$ defined in (4) with the decomposition $\bar f_d(r)$, $\bar f_n(r)$ satisfies the condition (11).

Proof. For $r(x;\theta)$, $\eta_{\mathrm{opt}}(x;\theta)$, and $\bar f_n$, we have

$$ f(r(x;\theta)) - (r(x;\theta) + \rho^{-1}) \big( \bar f_n(r(x;\theta)) + \bar c^T U_{\eta_{\mathrm{opt}}}^{-1} \eta_{\mathrm{opt}}(x;\theta) \big) $$
$$ = f(r(x;\theta)) - (r(x;\theta) + \rho^{-1})\, \bar f_n(r(x;\theta)) - \rho^{-1}\, \bar c^T U_{\eta_{\mathrm{opt}}}^{-1} \nabla \log r(x;\theta). $$

Under the condition (12), we see that the above expression is included in the linear space $\mathcal{L}[\nabla \log r(x;\theta)]$. □

Based on Corollary 2, we see that the estimator defined from

$$ \bar f_d(r) = \frac{f(r)}{1 + \rho r}, \qquad \bar f_n(r) = \frac{\rho f(r)}{1 + \rho r}, \qquad \bar\eta(x;\theta) = \eta_{\mathrm{opt}}(x;\theta) = \frac{1}{1 + \rho\, r(x;\theta)}\, \nabla \log r(x;\theta) \tag{13} $$

leads to an optimal estimator of the $f$-divergence.
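The decomposition in (13) can be checked numerically. The sketch below, written for the KL generator and a hypothetical value of $\rho$, confirms on a grid that $\bar f_d(r) + r \bar f_n(r)$ reproduces $f(r)$ and that $f(r) - (r + \rho^{-1}) \bar f_n(r)$ is identically zero, so that the condition of Corollary 2 holds trivially. The helper name `optimal_decomposition` is an assumption of this illustration.

```python
import numpy as np

def optimal_decomposition(f, rho):
    """Return (f_d, f_n) of (13): f_d = f / (1 + rho r), f_n = rho f / (1 + rho r)."""
    f_d = lambda r: f(r) / (1.0 + rho * r)
    f_n = lambda r: rho * f(r) / (1.0 + rho * r)
    return f_d, f_n

f_kl = lambda r: r - 1.0 - np.log(r)
rho = 2.0                                   # hypothetical value of m_n / m_d
f_d, f_n = optimal_decomposition(f_kl, rho)

r = np.linspace(0.1, 5.0, 50)
recombined = f_d(r) + r * f_n(r)                 # should reproduce f(r), cf. (5)
residual = f_kl(r) - (r + 1.0 / rho) * f_n(r)    # left-hand side of (12)
```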
We show another sufficient condition. 

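Corollary 3. Under Assumption 1, suppose that

$$ f(r(x;\theta)) - (r(x;\theta) + \rho^{-1})\, f'(r(x;\theta)) \in \mathcal{L}[\nabla \log r(x;\theta)] $$

and

$$ f'(r(x;\theta)) - \bar f_n(r(x;\theta)) \in \mathcal{L}[\bar\eta(x;\theta)] $$

hold for all $\theta \in \Theta$. Then the decomposition $f(r) = \bar f_d(r) + r \bar f_n(r)$ and the vector-valued function $\bar\eta(x;\theta)$ satisfy (11).

Proof. When $f'(r(x;\theta)) - \bar f_n(r(x;\theta)) \in \mathcal{L}[\bar\eta(x;\theta)]$ holds, there exists a vector $b \in \mathbb{R}^d$ such that

$$ f'(r(x;\theta)) - \bar f_n(r(x;\theta)) = b^T \bar\eta(x;\theta), $$

and thus

$$ \bar c^T U_{\bar\eta}^{-1} = E_n\big[ (f'(r(x;\theta)) - \bar f_n(r(x;\theta)))\, \nabla \log r(x;\theta)^T \big]\, E_n[\bar\eta\, \nabla \log r^T]^{-1} = b^T $$

holds. Then we have $\bar c^T U_{\bar\eta}^{-1} \bar\eta(x;\theta) = b^T \bar\eta(x;\theta) = f'(r(x;\theta)) - \bar f_n(r(x;\theta))$. Hence

$$ f(r(x;\theta)) - (r(x;\theta) + \rho^{-1}) \big( \bar f_n(r(x;\theta)) + \bar c^T U_{\bar\eta}^{-1} \bar\eta(x;\theta) \big) = f(r(x;\theta)) - (r(x;\theta) + \rho^{-1})\, f'(r(x;\theta)) \in \mathcal{L}[\nabla \log r(x;\theta)] $$

is satisfied, and thus (11) holds. □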
We consider the conjugate representation $f(r) = -f^*(f'(r)) + r f'(r)$, that is, $\bar f_d(r) = -f^*(f'(r))$ and $\bar f_n(r) = f'(r)$, where $f^*(r) = \sup_{s \in \mathbb{R}} \{rs - f(s)\}$. Then, Corollary 3 implies that the decomposition based on the conjugate representation leads to an optimal estimator when the model $r(x;\theta)$ and the $f$-divergence satisfy

$$ f(r(x;\theta)) - (r(x;\theta) + \rho^{-1})\, f'(r(x;\theta)) \in \mathcal{L}[\nabla \log r(x;\theta)]. \tag{14} $$

If (14) does not hold, the optimality of the estimator based on the conjugate representation is not guaranteed. On the other hand, the decomposition (13) leads to an optimal estimator without specific conditions on the model and the $f$-divergence. In addition, when $\bar f_n(r)$ is defined as $\bar f_n(r) = f'(r)$, the choice of $\bar\eta(x;\theta)$ in the moment matching estimator does not affect the asymptotic variance of the $f$-divergence estimator. Indeed, the equality $f'(r(x;\theta)) - \bar f_n(r(x;\theta)) = 0$ holds and the vector $c$ in (9) vanishes. As a result, the variance of the estimator $\bar D_f$ depends only on the decomposition of $f$ up to the order $O_p(1)$.

We show some examples for which Corollary 2 and Corollary 3 are applicable to construct the optimal estimator.

Example 1 (exponential density-ratio models and KL-divergence). Let the model be $r(x;\theta) = \exp\{\theta^T \phi(x)\}$, $\theta \in \mathbb{R}^d$, with $\phi(x) = (\phi_1(x), \ldots, \phi_d(x))^T$ and $\phi_1(x) = 1$. Then $\mathcal{L}[\nabla \log r(x;\theta)]$ is spanned by $1, \phi_2(x), \ldots, \phi_d(x)$, and clearly $\mathcal{L}[\nabla \log r(x;\theta)]$ includes the constant 1. The $f$-divergence with $f(r) = -\log r + r - 1$ leads to the KL-divergence. Let $\bar f_d(r) = -\log r - 1$ and $\bar f_n(r) = 1$; then (12) is satisfied, since

$$ f(r(x;\theta)) - (r(x;\theta) + \rho^{-1})\, \bar f_n(r(x;\theta)) = -\theta^T \phi(x) - 1 - \rho^{-1} \in \mathcal{L}[\nabla \log r(x;\theta)] $$

holds. Then, we see that the function $\bar\eta = \eta_{\mathrm{opt}}$ and the decomposition $\bar f_d(r) = -\log r - 1$ and $\bar f_n(r) = 1$ lead to an optimal estimator of the KL-divergence. We see that there is redundancy in the decomposition of $f$. Indeed, for any constants $c_0, c_1 \in \mathbb{R}$, the function $c_0 + c_1 \log r(x;\theta)$ is included in $\mathcal{L}[\nabla \log r]$. Hence the decomposition

$$ \bar f_n(r) = \frac{r + c_1 \log r + c_0}{r + \rho^{-1}}, \qquad \bar f_d(r) = r - \log r - 1 - r \bar f_n(r) $$

with $\bar\eta = \eta_{\mathrm{opt}}$ also leads to an optimal estimator. The decomposition in (13) is realized by setting $c_0 = -1$, $c_1 = -1$.

Example 2 (power model and power divergence). Let the model be $r(x;\theta) = (1 + \alpha\, \theta^T \phi(x))^{1/\alpha}$ with $\phi_1(x) = 1$, where $\alpha$ is the parameter specifying the divergence such that $\alpha > -1$. Then $\mathcal{L}[\nabla \log r(x;\theta)]$ is the linear space spanned by $\phi_1(x)/r^\alpha, \ldots, \phi_d(x)/r^\alpha$. We see that $1 = (e_1 + \alpha\theta)^T \nabla \log r(x;\theta)$ holds, where $e_1$ is the unit vector $(1, 0, \ldots, 0)^T \in \mathbb{R}^d$. The convex function $f(r) = r - 1 + (r^{-\alpha} - 1)/\alpha$ leads to the power divergence

$$ \int p_d(x)\, f\Big(\frac{p_n(x)}{p_d(x)}\Big)\, dx = \frac{1}{\alpha} \int p_d(x) \Big(\frac{p_d(x)}{p_n(x)}\Big)^{\alpha} dx - \frac{1}{\alpha}. $$

The Hellinger distance is given by setting $\alpha = -1/2$, and the Pearson divergence is realized by setting $\alpha = 1$. In the limit $\alpha \to 0$, the KL-divergence is recovered. Letting $\bar f_d(r) = -1 + (r^{-\alpha} - 1)/\alpha$ and $\bar f_n(r) = 1$, we have

$$ f(r(x;\theta)) - (r(x;\theta) + \rho^{-1})\, \bar f_n(r(x;\theta)) = \frac{r(x;\theta)^{-\alpha} - 1}{\alpha} - 1 - \rho^{-1} \in \mathcal{L}[\nabla \log r(x;\theta)], $$

since $r^{-\alpha} = \phi_1/r^{\alpha}$ and the constants belong to $\mathcal{L}[\nabla \log r(x;\theta)]$. Thus, due to Corollary 2, the decomposition $\bar f_d(r) = -1 + (r^{-\alpha} - 1)/\alpha$, $\bar f_n(r) = 1$ and the moment matching estimator using $\bar\eta = \eta_{\mathrm{opt}}$ lead to an optimal estimator of the power divergence under the power model. Also, the decomposition (13) leads to another optimal estimator.
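The special cases of the power divergence mentioned in Example 2 can be verified numerically. The sketch below evaluates the generator $f(r) = r - 1 + (r^{-\alpha} - 1)/\alpha$ on a grid and compares it with the closed forms it reduces to; the function name `f_power` and the grid are choices made for this illustration.

```python
import numpy as np

def f_power(r, alpha):
    """Generator f(r) = r - 1 + (r**(-alpha) - 1) / alpha of Example 2."""
    return r - 1.0 + (r ** (-alpha) - 1.0) / alpha

r = np.linspace(0.2, 4.0, 40)

hellinger = f_power(r, -0.5)     # reduces to (sqrt(r) - 1)**2
pearson = f_power(r, 1.0)        # reduces to (r - 1)**2 / r
near_kl = f_power(r, 1e-6)       # approaches r - 1 - log r as alpha -> 0
```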



Example 3 (exponential density-ratio model and mutual information). Let the model be $r(x;\theta) = \exp\{\theta^T \phi(x)\}$, $\theta \in \mathbb{R}^d$, with $\phi(x) = (\phi_1(x), \ldots, \phi_d(x))^T$ and $\phi_1(x) = 1$. Then, the linear space $\mathcal{L}[\nabla \log r(x;\theta)]$ is spanned by $\{\phi_1(x), \ldots, \phi_d(x)\}$, and thus $\mathcal{L}[\nabla \log r(x;\theta)]$ includes functions of the form $c_0 + c_1 \log r(x;\theta)$ for $c_0, c_1 \in \mathbb{R}$. Let the convex function $f(r)$ be

$$ f(r) = \frac{1}{1+\rho} \log \frac{1+\rho}{1+\rho r} + r\, \frac{\rho}{1+\rho} \log \frac{r(1+\rho)}{1+\rho r} \tag{15} $$

for $\rho > 0$. Then the corresponding $f$-divergence reduces to mutual information:

$$ \int p_d(x)\, f\Big(\frac{p_n(x)}{p_d(x)}\Big)\, dx = \sum_{y = n, d} \int p(x, y) \log \frac{p(x, y)}{p(x)\, p(y)}\, dx, $$

in which the joint probability is defined as

$$ p(x, n) = p_n(x)\, \frac{\rho}{1+\rho}, \qquad p(x, d) = p_d(x)\, \frac{1}{1+\rho}. $$

The equality $p_d = p_n$ implies that the conditional probability $p(x|y)$ is independent of $y$. Thus, mutual information becomes zero if and only if $p_d = p_n$ holds. For any moment matching estimator, the following decomposition satisfies the condition in Corollary 3:

$$ \bar f_d(r) = \frac{1}{1+\rho} \log \frac{1+\rho}{1+\rho r}, \qquad \bar f_n(r) = \frac{\rho}{1+\rho} \log \frac{r(1+\rho)}{1+\rho r}. \tag{16} $$

Indeed, the equalities

$$ f(r(x;\theta)) - (r(x;\theta) + \rho^{-1})\, f'(r(x;\theta)) = -\frac{\log r(x;\theta)}{1+\rho} \in \mathcal{L}[\nabla \log r(x;\theta)], $$
$$ f'(r(x;\theta)) - \bar f_n(r(x;\theta)) = 0 \in \mathcal{L}[\bar\eta(x;\theta)] $$

hold for any $\bar\eta(x;\theta)$. The estimator derived from the decomposition above with $\bar\eta = \eta_{\mathrm{opt}}$ has also been proposed by Keziou and Leoni-Aubin [12]. In their work, the estimator is derived as the conjugate expression of the prospective likelihood. In this example, we present another characterization, that is, the optimal estimator for mutual information.
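The identity between the $f$-divergence generated by (15) and mutual information can be checked on a small discrete example. In the sketch below, the distributions, the value of $\rho$, and the function name `f_mi` are hypothetical; the integral over $x$ is replaced by a sum over three points.

```python
import numpy as np

def f_mi(r, rho):
    """Generator (15), whose f-divergence equals mutual information."""
    a = np.log((1.0 + rho) / (1.0 + rho * r)) / (1.0 + rho)
    b = rho * r * np.log(r * (1.0 + rho) / (1.0 + rho * r)) / (1.0 + rho)
    return a + b

rho = 1.5
p_d = np.array([0.2, 0.5, 0.3])
p_n = np.array([0.4, 0.4, 0.2])

# f-divergence D_f(p_d, p_n) with the generator (15).
d_f = float(np.sum(p_d * f_mi(p_n / p_d, rho)))

# Mutual information of (x, y), y in {n, d}, under the joint law of Example 3.
joint = np.vstack([p_n * rho / (1.0 + rho), p_d / (1.0 + rho)])
p_x = joint.sum(axis=0)          # marginal of x
p_y = joint.sum(axis=1)          # marginal of y
mi = float(np.sum(joint * np.log(joint / np.outer(p_y, p_x))))
```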

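Example 4 (linear model). Let $r(x;\theta) = 1 + \theta^T \phi(x)$ and $\phi_1(x) = 1$. The subspace $\mathcal{L}[\nabla \log r(x;\theta)]$ is spanned by $\{\phi_1/r, \ldots, \phi_d/r\}$, and thus $\mathcal{L}[\nabla \log r(x;\theta)]$ includes functions of the form $c_0 + c_1/r$ for $c_0, c_1 \in \mathbb{R}$. Let the convex function $f$ be

$$ f(r) = \frac{1}{\rho+1} \Big[ r - 1 + (1 + \rho r) \log \frac{1 + \rho r}{r(1+\rho)} \Big] $$

for $\rho > 0$. Then the corresponding $f$-divergence is expressed as

$$ \int p_d(x)\, f\Big(\frac{p_n(x)}{p_d(x)}\Big)\, dx = \mathrm{KL}\Big( \frac{p_d + \rho\, p_n}{1+\rho},\; p_n \Big), $$

where KL is the Kullback-Leibler divergence. The $f$-divergence vanishes if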
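and only if $p_n = p_d$ holds. Using Corollary 2, we see that the decomposition

$$ \bar f_d(r) = \frac{1}{\rho+1} \Big[ \frac{r-1}{1+\rho r} + \log \frac{1+\rho r}{r(1+\rho)} \Big], \qquad \bar f_n(r) = \frac{\rho}{\rho+1} \Big[ \frac{r-1}{1+\rho r} + \log \frac{1+\rho r}{r(1+\rho)} \Big] $$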

and the moment matching estimator using $\bar\eta = \eta_{\mathrm{opt}}$ lead to an optimal estimator for the above $f$-divergence. On the other hand, due to Corollary 3, we see that the decomposition

$$ \bar f_d(r) = \frac{1}{1+\rho} \log \frac{1+\rho r}{r(1+\rho)}, \qquad \bar f_n(r) = f'(r) = \frac{r-1}{r(1+\rho)} + \frac{\rho}{1+\rho} \log \frac{1+\rho r}{r(1+\rho)} $$

leads to another optimal estimator.
5 Homogeneity test exploiting f-divergence estimator

For the homogeneity test of $p_n$ and $p_d$, we need to know the asymptotic distribution of $\hat D_f$ under the null hypothesis $H_0$ of (8). In this section, we assume

$$ \frac{p_n(x)}{p_d(x)} = r(x;\theta^*) = 1 $$

and $1 \in \mathcal{L}[\eta(x;\theta^*)]$. Then we see that $P_f = 0$ holds for any decomposition of $f$, and thus the asymptotic expansion of $\hat D_f$ around $\theta = \theta^*$ satisfies

$$ \sqrt{m}\,(\hat D_f - D_f) = o_p(1), $$

where $D_f = D_f(p_d, p_n) = 0$. For $p_n = p_d$, the variance-covariance matrix $(\rho V_d[r\eta] + V_n[\eta])/(\rho + 1)$ in (2) is degenerate. This is the reason why the probabilistic order of $\sqrt{m}\,(\hat D_f - D_f)$ becomes $o_p(1)$. On the other hand, for $p_n \neq p_d$, $\sqrt{m}\,(\hat D_f - D_f)$ is of the order $O_p(1)$.

Below, we consider the optimal estimator $\bar D_f$ defined from (13). The asymptotic distribution of the optimal estimator is given by the following theorem.

Theorem 4. Let Assumption 1 hold, and assume $p_n(x)/p_d(x) = r(x;\theta^*) = 1$. Suppose that the ratio of the sample sizes, $\rho = m_n/m_d$, converges to a positive value, and that the $d$ by $d$ symmetric matrix $U_\eta$ with $\eta = \eta_{\mathrm{opt}}$ is non-degenerate in the vicinity of $\theta = \theta^*$. Let $\bar D_f$ be the estimator defined from (13). Then, in terms of the asymptotic distribution of $\bar D_f$, we obtain

$$ \frac{2m}{f''(1)}\, \bar D_f \xrightarrow{d} \chi^2_{d-1}, $$

where $\chi^2_k$ is the chi-square distribution with $k$ degrees of freedom.

The proof is deferred to Appendix 1. For the homogeneity test of $p_n$ and $p_d$, the null hypothesis $p_n = p_d$ is rejected if

$$ \bar D_f \ge \frac{f''(1)}{2m}\, \chi^2_{d-1}(1 - \alpha) \tag{17} $$

is satisfied, where $\chi^2_{d-1}(1 - \alpha)$ is the chi-square $100(1-\alpha)$ percent point function with $d - 1$ degrees of freedom. The homogeneity test based on (17) with the optimal choice (13) is referred to as the $\bar D_f$-based test.
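The rejection rule (17) amounts to comparing the scaled statistic $2m \bar D_f / f''(1)$ with a chi-square quantile. The following sketch shows the arithmetic under stated assumptions: the estimate `d_bar`, the sample sizes, and the function name `df_based_test` are hypothetical, and the quantile is passed in as a number (here the 95% point for one degree of freedom) rather than computed.

```python
import numpy as np

def df_based_test(d_bar, m_n, m_d, f_second_at_1, chi2_quantile):
    """Reject H0 when 2 m d_bar / f''(1) reaches the chi-square quantile, cf. (17)."""
    m = m_n * m_d / (m_n + m_d)            # m = (1/m_n + 1/m_d)^(-1)
    statistic = 2.0 * m * d_bar / f_second_at_1
    return statistic, bool(statistic >= chi2_quantile)

# Hypothetical numbers: a model with d = 2 parameters, so d - 1 = 1 degree of freedom.
CHI2_1_095 = 3.841    # 95% point of the chi-square distribution with 1 d.o.f.

# For the KL generator f(r) = r - 1 - log r we have f''(1) = 1.
stat, reject = df_based_test(d_bar=0.004, m_n=1000, m_d=1000,
                             f_second_at_1=1.0, chi2_quantile=CHI2_1_095)
stat_small, reject_small = df_based_test(d_bar=0.003, m_n=1000, m_d=1000,
                                         f_second_at_1=1.0,
                                         chi2_quantile=CHI2_1_095)
```

With $m = 500$, the first statistic is about 4.0 and leads to rejection at the 5% level, while the second is about 3.0 and does not.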
We consider the power function of the homogeneity test, and compare the proposed method to another method. A standard approach to the homogeneity test is to exploit the asymptotic distribution of the empirical likelihood estimator $\tilde\theta$. Under the model

$$ r(x;\theta) = \exp\{\alpha + \phi(x;\beta)\}, \qquad \theta = (\alpha, \beta) \in \mathbb{R} \times \mathbb{R}^{d-1}, \tag{18} $$

Fokianos et al. [7] pointed out that the asymptotic distribution of the empirical likelihood estimator $\tilde\theta = (\tilde\alpha, \tilde\beta) \in \mathbb{R} \times \mathbb{R}^{d-1}$ under the null hypothesis $p_n = p_d$ is given as

$$ \sqrt{m}\,(\tilde\beta - \beta^*) \xrightarrow{d} N_{d-1}\big(0,\; V_n[\nabla_\beta \phi]^{-1}\big), $$

where $\theta^* = (\alpha^*, \beta^*)$ and $\nabla_\beta \phi$ is the $(d-1)$-dimensional gradient vector of $\phi(x;\beta)$ at $\beta = \beta^*$ with respect to the parameter $\beta$. Then the null hypothesis is rejected if the test statistic

$$ S = m\,(\tilde\beta - \beta^*)^T\, \hat V_n[\nabla_\beta \phi]\, (\tilde\beta - \beta^*) \tag{19} $$

is larger than $\chi^2_{d-1}(1 - \alpha)$, where $\hat V_n[\nabla_\beta \phi]$ is a consistent estimator of $V_n[\nabla_\beta \phi]$. In this paper, the homogeneity test based on the statistic $S$ is referred to as the empirical likelihood test. Fokianos et al. [7] studied statistical properties of the empirical likelihood test through numerical experiments, and reported that the power of the empirical likelihood test is comparable to the standard $t$-test and $F$-test.
Below, we show that the power of the $\bar D_f$-based test is the same as that of the empirical likelihood test under the setup of the local alternative, where the distributions $p_n$ and $p_d$ vary according to the sample size. To compute the power function, we assume the following conditions.



Assumption 2. Let the density ratio model $r(x;\theta)$ be represented as (18). Let $r(x;\theta^*) = 1$ and $\theta_m = \theta^* + h_m/\sqrt{m}$, where $h_m \in \mathbb{R}^d$ and $\lim_{m \to \infty} h_m = h \in \mathbb{R}^d$. Suppose $p_d(x) = p(x)$ for a fixed probability density $p(x)$ and that the probability density $p_n^{(m)}$ is represented as $p_n^{(m)}(x) = p_d(x)\, r(x;\theta_m)$. For each sample size $m_n$ and $m_d$, the samples $x_1^{(n)}, \ldots, x_{m_n}^{(n)}$ are generated from $p_n^{(m)}$, and $x_1^{(d)}, \ldots, x_{m_d}^{(d)}$ are generated from $p_d$. The limit of the ratio $m_n/m_d$ is denoted as $\rho$. Let the matrix-valued functions $M(\theta)$ and $U(\theta)$ be

$$ M(\theta) = \int p(x)\, \nabla \log r(x;\theta)\, \nabla \log r(x;\theta)^T\, dx, $$
$$ U(\theta) = \int p(x)\, \frac{1}{1 + \rho\, r(x;\theta)}\, \nabla \log r(x;\theta)\, \nabla \log r(x;\theta)^T\, dx, $$

and assume that these are continuous and non-degenerate in the vicinity of $\theta^*$. Let $V[\nabla r]$ be the variance-covariance matrix of $\nabla \log r(x;\theta^*) = \nabla r(x;\theta^*)$ under $p(x)$. We assume

$$ \frac{1}{m_n} \sum_{j=1}^{m_n} \nabla r(x_j^{(n)};\theta^*)\, \nabla r(x_j^{(n)};\theta^*)^T \xrightarrow{p} M(\theta^*), \tag{20} $$

$$ \sqrt{m}\, U(\theta_m)(\hat\theta - \theta_m) \xrightarrow{d} N\Big(0,\; \frac{1}{(1+\rho)^2}\, V[\nabla r]\Big), \tag{21} $$

where (20) means convergence in probability, that is, for any $\varepsilon > 0$, the probability that

$$ \Big\| \frac{1}{m_n} \sum_{j=1}^{m_n} \nabla r(x_j^{(n)};\theta^*)\, \nabla r(x_j^{(n)};\theta^*)^T - M(\theta^*) \Big\| > \varepsilon $$

under the samples from $p_n^{(m)}$ converges to zero when $m_n$ tends to infinity. The notation $X_m \xrightarrow{d} P$ in (21) denotes that the distribution function of $X_m$, depending on $p_n^{(m)}$ and $p_d$, converges to $P$ in law when $m$ tends to infinity. See Section 14 in [22] and Section 11.4.2 in [15] for details of the asymptotic theory under the local alternative. For $h_m = 0 \in \mathbb{R}^d$, the condition on $\sqrt{m}\, U(\theta_m)(\hat\theta - \theta_m)$ reduces to (2) with $\eta = \eta_{\mathrm{opt}}$.

In the above, one can weaken the assumption so that the probability $p_d$ also varies according to the sample size. We adopt the simplified assumption above to avoid technical difficulties.

Theorem 5. Under Assumption 1 and Assumption 2, the power function of the $\bar D_f$-based test is asymptotically given as $\Pr\{Y \ge \chi^2_{d-1}(1 - \alpha)\}$, where $Y$ is the random variable whose distribution function is the non-central chi-square distribution with $d - 1$ degrees of freedom and non-centrality parameter $h^T M(\theta^*) h$. Moreover, the asymptotic power function of the empirical likelihood test is the same.

The proof is given in Appendix 2. Theorem 5 implies that, under the local alternative, the power function of the $\bar D_f$-based test does not depend on the choice of the $f$-divergence and that the empirical likelihood test has the same power as the $\bar D_f$-based test.
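The asymptotic power in Theorem 5 is a tail probability of a non-central chi-square distribution, which can be approximated by simulation. The sketch below is an illustration under assumptions: the matrix `M` is a hypothetical stand-in for $M(\theta^*)$, the vectors `h` are arbitrary, the quantile is supplied as a number, and the function name `local_power` is not from the paper.

```python
import numpy as np

def local_power(h, M, chi2_quantile, n_draws=200000, seed=0):
    """Monte Carlo estimate of Pr{Y >= quantile}, where Y follows the non-central
    chi-square law with d - 1 degrees of freedom and non-centrality h^T M h."""
    h = np.asarray(h, dtype=float)
    M = np.asarray(M, dtype=float)
    df = h.size - 1
    nonc = float(h @ M @ h)
    rng = np.random.default_rng(seed)
    y = rng.noncentral_chisquare(df, nonc, size=n_draws)
    return float(np.mean(y >= chi2_quantile))

CHI2_1_095 = 3.841                 # 95% point of chi-square with 1 d.o.f.
M = np.eye(2)                      # hypothetical stand-in for M(theta*)

power_small = local_power([0.0, 1.0], M, CHI2_1_095)           # non-centrality 1
power_large = local_power([0.0, np.sqrt(5.0)], M, CHI2_1_095)  # non-centrality 5
```

As expected, the power grows with the non-centrality parameter, that is, with the size of the local deviation $h$.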

Next, we consider the power function under the misspecification case.

Theorem 6. We assume that the density ratio $p_n^{(m)}/p_d$ is not realized by the model $r(x;\theta)$, and that $p_n^{(m)}$ is represented as

$$ p_n^{(m)}(x) = p_d(x) \Big( r(x;\theta_m) + \frac{s_m(x) + \varepsilon_m}{\sqrt{m}} \Big), $$

where $s_m(x)$ satisfies $E[s_m(x)] = 0$ under the probability $p_d(x) = p(x)$, and assume $\lim_{m \to \infty} \varepsilon_m = \varepsilon$. Suppose Assumption 1 and Assumption 2, except for the definition of $p_n^{(m)}(x)$. Then, under the setup of the local alternative, the power function of the $\bar D_f$-based test is larger than or equal to that of the empirical likelihood test.

The proof is given in Appendix 3. Even in the misspecification case, the assumptions (20) and (21) will be valid, since eventually the limit of $p_n^{(m)}/p_d$ is realized by the model $r(x;\theta^*) = 1$. Theorem 5 and Theorem 6 indicate that the $\bar D_f$-based test is at least as powerful as the empirical likelihood test, regardless of whether the model $r(x;\theta)$ is correct or slightly misspecified.



6 Numerical Studies 

In this section, we report numerical results for illustrating the adequacy of 
the asymptotic theory for finite-sample inference. 

We examine two $f$-divergences for the homogeneity test. One is the 
KL-divergence defined by $f(r) = r - 1 - \log(r)$ as shown in Example 1, and 
the test statistic is derived from (13). This is referred to as the KL-based 
test. The other is mutual information defined by (15), and the estimator $D_f$ 
is derived from the decomposition (16) and the moment matching estimator 
$\eta = \eta_{\mathrm{opt}}$. This is referred to as the MI-based test. These tests are 
compared to the empirical likelihood test (19) proposed by Fokianos et al. [7] 
and to the Hotelling $T^2$-test. The null hypothesis is $H_0 : p_n = p_d$ and the 
alternative is $H_1 : p_n \neq p_d$. The type-I error and the power function of 
these tests are computed. In all numerical studies, the sample $x$ is a 
10-dimensional vector, and the semiparametric model for the density ratio is 
defined as 

$$r(x;\theta) = \exp\Big\{ \alpha + \sum_{i=1}^{10} \beta_i x_i + \sum_{j=1}^{10} \beta_{10+j} x_j^2 \Big\} \tag{22}$$

with the 21-dimensional parameter $\theta = (\alpha, \beta_1, \ldots, \beta_{20})$. 
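
As an illustration of the model (22), the following sketch evaluates the exponential quadratic density ratio for a given parameter vector. The function name `density_ratio` and the parameter layout (intercept first, then linear coefficients, then quadratic coefficients) are assumptions made for this example.

```python
import math

def density_ratio(x, theta):
    """Evaluate r(x; theta) = exp(alpha + sum_i beta_i x_i + sum_j beta_{dim+j} x_j^2).

    theta is laid out as (alpha, linear coefficients, quadratic coefficients),
    so it must have 2 * len(x) + 1 entries (21 entries for 10-dimensional x).
    """
    dim = len(x)
    if len(theta) != 2 * dim + 1:
        raise ValueError("theta must have 2 * len(x) + 1 entries")
    alpha = theta[0]
    linear = theta[1:dim + 1]
    quadratic = theta[dim + 1:]
    exponent = alpha
    exponent += sum(b * xi for b, xi in zip(linear, x))
    exponent += sum(b * xi * xi for b, xi in zip(quadratic, x))
    return math.exp(exponent)

# With theta = 0 the model reduces to r(x) = 1, i.e. the null hypothesis p_n = p_d.
print(density_ratio([0.3] * 10, [0.0] * 21))
```

Setting all 21 parameters to zero gives the constant ratio 1, which is the point $\theta^*$ at which the model realizes the null hypothesis.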

First we assume that the null hypothesis $p_n = p_d$ is correct, and we 
compute the type-I error. We consider three cases: in the first case, the 
distributions $p_n$ and $p_d$ are given as the 10-dimensional normal distribution 
$N_{10}(0, I_{10})$; in the second case, each element of $x \in \mathbb{R}^{10}$ is independent 
and identically distributed from the t-distribution with 10 degrees of freedom; 
and in the third case, each element of $x \in \mathbb{R}^{10}$ is independent and 
identically distributed from the t-distribution with 5 degrees of freedom. The 
sample size is set to $m_n = m_d$ and varies from 100 to 1200, and the 
significance level is set to 0.05. The type-I errors are averaged over 300 runs. 
For each case, the averaged type-I errors of the KL-based test, the MI-based 
test, and the empirical likelihood test are shown in Table 1. In the normal case, 
the type-I error of the three tests converges to the significance level with 
modest sample size. In the case of the t-distribution, the type-I error of the 
empirical likelihood test is larger than the significance level even with large 
sample size. On the other hand, the type-I error of the MI-based test is close to 
the significance level with moderate sample size even for the case of the 
t-distribution. 

Table 1: Averaged type-I errors over 300 runs are shown as functions of 
the number of samples. Normal distribution, t-distribution with 10 degrees 
of freedom, and t-distribution with 5 degrees of freedom are examined as $p_n$ 
and $p_d$. Below, "MI", "KL" and "emp." denote MI-based test, KL-based 
test and empirical likelihood test, respectively. 

                 10-dim. Normal           10-dim. t-dist. (df=10)  10-dim. t-dist. (df=5)
$m_n$ (= $m_d$)  MI     KL     emp.       MI     KL     emp.       MI     KL     emp.
100              0.080  0.117  0.183      0.133  0.217  0.297      0.100  0.210  0.377
500              0.070  0.083  0.080      0.070  0.090  0.107      0.060  0.107  0.187
1000             0.053  0.057  0.060      0.073  0.070  0.093      0.070  0.103  0.170
1200             0.047  0.050  0.067      0.073  0.087  0.097      0.067  0.093  0.170
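
To judge how close an averaged type-I error is to the nominal level, it helps to know the Monte Carlo noise of a rejection rate estimated from 300 runs. The sketch below computes the binomial standard error under the assumption that the 300 runs are independent and that the true rejection probability equals the significance level. The function name `rejection_rate_se` is introduced here for illustration only.

```python
import math

def rejection_rate_se(level, runs):
    """Binomial standard error of a rejection rate estimated from `runs` independent tests."""
    return math.sqrt(level * (1.0 - level) / runs)

se = rejection_rate_se(0.05, 300)
band = (0.05 - 2.0 * se, 0.05 + 2.0 * se)   # rough two-standard-error band around the level
print(round(se, 4), band)
```

Under these assumptions the standard error is about 0.013, so averaged type-I errors between roughly 0.025 and 0.075 are compatible with the nominal level 0.05, whereas a value such as 0.170 is far outside that band.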

Next, we compute the power function of the KL-based test, the MI-based test, 
the empirical likelihood test, and the Hotelling $T^2$-test. In the numerical 
simulations, $p_n(x)$ is fixed and $p_d(x)$ is varied by changing the mean 
parameter or the scale parameter. In the same way as the computation of the 
type-I error, $p_n(x)$ is fixed to one of the three probabilities: the 
10-dimensional normal distribution $N_{10}(0, I_{10})$, or the 10-dimensional 
t-distribution with 10 or 5 degrees of freedom. The probability $p_d(x)$ is 
defined by changing the mean or the variance of the probability $p_n(x)$. In the 
first setup, the sample $x^{(d)} = (x_1^{(d)}, \ldots, x_{10}^{(d)})$ from $p_d$ is computed such that 

$$x_\ell^{(d)} = x_\ell + \mu, \quad \ell = 1, \ldots, 10, \quad x = (x_1, \ldots, x_{10}) \sim p_n, \tag{23}$$

that is, the mean parameter $\mu \in \mathbb{R}$ is added to each element of $x$. Hence, 
$p_n = p_d$ holds for $\mu = 0$. In the second setup, the sample 
$x^{(d)} = (x_1^{(d)}, \ldots, x_{10}^{(d)})$ from $p_d$ is computed such that 

$$x_\ell^{(d)} = \sigma \times x_\ell, \quad \ell = 1, \ldots, 10, \quad x = (x_1, \ldots, x_{10}) \sim p_n, \tag{24}$$

that is, each element of $x$ is multiplied by the scale parameter $\sigma > 0$. 
Hence, $p_n = p_d$ holds for $\sigma = 1$. In all simulations, the sample size is 
set to $m_n = m_d = 500$ or $1000$ and the significance level is 0.05. When both 
$p_n$ and $p_d$ are the multi-dimensional normal distribution, the density ratio 
model (22) includes the true ratio. For the t-distribution, however, the 
true ratio $r(x)$ resides outside of the model (22). The power functions are 
averaged over 300 runs. 
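
The two setups (23) and (24) can be sketched as a small sampler. This is an illustrative sketch, not the code used for the reported experiments. It assumes that the coordinates of the t-distributed vectors are independent, as in the type-I error study, and the names `sample_base` and `sample_denominator` are hypothetical.

```python
import math
import random

def sample_base(rng, dist, dim=10):
    """Draw one vector from p_n with i.i.d. coordinates: "normal", "t10" or "t5"."""
    if dist == "normal":
        return [rng.gauss(0.0, 1.0) for _ in range(dim)]
    df = {"t10": 10, "t5": 5}[dist]
    out = []
    for _ in range(dim):
        z = rng.gauss(0.0, 1.0)
        chi2 = rng.gammavariate(df / 2.0, 2.0)   # chi-square with df degrees of freedom
        out.append(z / math.sqrt(chi2 / df))     # Student-t variate
    return out

def sample_denominator(rng, dist, mu=0.0, sigma=1.0, dim=10):
    """Draw from p_d: shift by mu as in (23) or scale by sigma as in (24)."""
    return [sigma * xl + mu for xl in sample_base(rng, dist, dim)]

rng = random.Random(0)
x_d = sample_denominator(rng, "t5", mu=0.1)      # one draw under the mean-shift setup
print(len(x_d))
```

With the defaults `mu=0.0` and `sigma=1.0` the sampler returns a draw from $p_n$ itself, which corresponds to the null hypothesis.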

Table 2 shows the averaged power functions over 300 runs for the setup 
(23). The mean parameter $\mu$ varies from $-0.1$ to $0.1$. When both $p_n$ and 
$p_d$ are the normal distribution, the power functions of the KL-based test, the 
MI-based test, and the empirical likelihood test almost coincide with each 
other. The power of the Hotelling $T^2$-test is slightly larger than the others. 
This result is expected, since the Hotelling $T^2$-test works well under the 
normal distribution. Under the t-distribution with 5 degrees of freedom, the 
power of the empirical likelihood test around $\mu = 0$ is much larger than the 
significance level. That is, the empirical likelihood test is not conservative, 
and will lead to false positives with high probability. For the MI-based test, 
the power around $\mu = 0$ is close to the significance level and the power is 
comparable to that of the Hotelling $T^2$-test outside of the vicinity of $\mu = 0$. 

Table 3 shows the averaged power functions over 300 runs when the scale 
parameter $\sigma$ in (24) varies from 0.9 to 1.1. In this case, the means of $p_n$ 
and $p_d$ are the same, and hence the Hotelling $T^2$-test fails to detect the 
difference between $p_n$ and $p_d$. In addition, we see that the power function of 
the empirical likelihood test is biased, that is, the power function takes its 
minimum value at $\sigma$ less than 1. This is because the estimated variance, 
$V_n$, based on the empirical likelihood estimator tends to take slightly smaller 
values than the true variance. For the MI-based test, the power around 
$\sigma = 1$ is close to the significance level, while the power of the KL-based 
test is slightly larger than the significance level around $\sigma = 1$. 

As shown above, when the model $r(x;\theta)$ is correct, the powers of the 
KL-based test, the MI-based test, and the empirical likelihood test are almost 
the same. Thus, the numerical simulations agree with the theoretical results in 
Theorem 5. The empirical likelihood test has a large type-I error and its power 
is slightly biased, especially when the samples are generated from the 
t-distribution. Throughout the simulations, the MI-based test has power 
comparable to the other methods, while its type-I error is well controlled. In 
the simulations, we see that the null distribution of the MI-based test is 
approximated by the asymptotic distribution more accurately than that of the 
KL-based test, although the first-order asymptotic theory provided in Section 5 
does not explain the difference between the MI-based test and the KL-based test. 
We expect that higher-order asymptotic theory is needed to better understand the 
difference among $f$-divergences for the homogeneity test. 

7 Conclusion 

We have addressed the inference problem of density ratios and its application 
to the homogeneity test under semiparametric models. We showed that 
the estimator introduced by Qin [17] provides an optimal estimator of the 
$f$-divergence with an appropriate decomposition of the function $f$, and 
proposed a test statistic for the homogeneity test using the optimal 
$f$-divergence estimator. It is revealed that the power function of the 
$D_f$-based test does not depend on the choice of the $f$-divergence up to the 
first order under the local alternative setup. Additionally, the $D_f$-based test 
and the empirical likelihood test [7] were shown to have asymptotically the same 
power. For misspecified density-ratio models, we showed that the $D_f$-based test 
usually has greater power than the empirical likelihood test. In numerical 
studies, the mutual information based test provided more reliable results than 
the others, that is, the null distribution was well approximated by the 
asymptotic distribution with moderate sample size, and the power was comparable 
to the Hotelling $T^2$-test even under the normal case. 

The choice of the $f$-divergence is an important open problem for the 
homogeneity test. In our first-order asymptotic theory, the choice of the 
$f$-divergence does not affect the power function. Hence, higher-order 
asymptotic theory may be necessary to clarify the difference among 
$f$-divergences for the homogeneity test. 

Appendix 1 
Proof of Theorem 4 

Proof. Let $\delta\theta = \hat{\theta} - \theta^*$. Then the asymptotic expansion of the estimator gives 

$$\sqrt{m}\,\delta\theta = -\sqrt{m}\, U_\eta^{-1} Q_\eta + o_p(1),$$

where $\eta = \eta_{\mathrm{opt}}$ as defined earlier. Let $f_d(r) = f(r)/(1 + \rho r)$ and 
$f_n(r) = \rho f(r)/(1 + \rho r)$. Then we have $f_d(1) = f_d'(1) = f_n(1) = f_n'(1) = 0$ 
and $f_d''(1) + f_n''(1) = f''(1)$, since $f(1) = f'(1) = 0$ is assumed. Hence, the 
asymptotic expansion of $m D_f$ around $\theta = \theta^*$ leads to 

$$m D_f = \frac{f_d''(1)}{2} \sqrt{m}\,\delta\theta^T E_d[\nabla r(x;\theta^*) \nabla r(x;\theta^*)^T] \sqrt{m}\,\delta\theta$$
$$\qquad + \frac{f_n''(1)}{2} \sqrt{m}\,\delta\theta^T E_n[\nabla r(x;\theta^*) \nabla r(x;\theta^*)^T] \sqrt{m}\,\delta\theta + o_p(1)$$
$$\quad = \frac{(1+\rho)^2 f''(1)}{2} \sqrt{m}\, Q_\eta^T \big( E_n[\nabla r(x;\theta^*) \nabla r(x;\theta^*)^T] \big)^{-1} \sqrt{m}\, Q_\eta + o_p(1),$$

since $p_n = p_d$ and $r(x;\theta^*) = 1$ hold. The asymptotic distribution of 
$\sqrt{m}\, Q_\eta$ is the Gaussian distribution with mean zero and variance-covariance 
matrix $V_n[\nabla r]/(1+\rho)^2$, since the equality 
$\eta_{\mathrm{opt}}(x;\theta^*) = \nabla \log r(x;\theta^*)/(1+\rho) = \nabla r(x;\theta^*)/(1+\rho)$ 
holds. Let $M$ be the $d$ by $d$ matrix defined as 

$$M = E_n[\nabla r(x;\theta^*) \nabla r(x;\theta^*)^T],$$

and $\sqrt{V}$ be a $d$ by $d$ matrix such that $\sqrt{V} \sqrt{V}^T = V_n[\nabla r]$. Then 
asymptotically 

$$\frac{2m}{f''(1)} D_f \xrightarrow{d} Z_d^T \sqrt{V}^T M^{-1} \sqrt{V} Z_d$$

holds, where $Z_d$ is the $d$-dimensional random vector whose distribution is the 
$d$-dimensional standard Gaussian distribution, that is, $Z_d \sim N_d(0, I_d)$. Let 
$\sqrt{M}$ be the symmetric positive definite matrix such that $M = \sqrt{M} \sqrt{M}^T$ 
and the vector $\mu$ be $\mu = E_n[\nabla r(x;\theta^*)]$. Note that $\sqrt{M}$ is well-defined, 
since $M$ is a positive definite matrix. Let $P$ be the $d$ by $d$ matrix 
$P = I - \sqrt{M}^{-1} \mu \mu^T \sqrt{M}^{-1}$; then $P$ is the projection matrix along the 
vector $\sqrt{M}^{-1} \mu$. Indeed, we have 

$$\| \sqrt{M}^{-1} \mu \|^2 = E_n[\nabla r]^T E_n[\nabla r \nabla r^T]^{-1} E_n[\nabla r] = E_n[\nabla r]^T b = 1,$$

where $b \in \mathbb{R}^d$ is the vector such that 
$\nabla \log r(x;\theta^*)^T b = \nabla r(x;\theta^*)^T b = 1$. 
We can choose $\sqrt{V} = \sqrt{M} P$, since $V_n[\nabla r] = M - \mu \mu^T$ holds. As a 
result, we have $Z_d^T \sqrt{V}^T M^{-1} \sqrt{V} Z_d = Z_d^T P Z_d$, and the distribution of 
$Z_d^T P Z_d$ is the chi-square distribution with $d - 1$ degrees of freedom. □ 
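
The last step of the proof uses the standard fact that, for a symmetric idempotent matrix $P$, the quadratic form $Z_d^T P Z_d$ of a standard Gaussian vector is chi-square distributed with degrees of freedom equal to the trace of $P$. The sketch below checks the two properties that matter here, idempotency and trace $d - 1$, for a projection of the form $I - u u^T / \|u\|^2$. The direction `u` is an arbitrary hypothetical vector standing in for $\sqrt{M}^{-1}\mu$.

```python
def projection_matrix(u):
    """Return P = I - u u^T / ||u||^2 as a list of rows."""
    d = len(u)
    s = sum(ui * ui for ui in u)
    return [[(1.0 if i == j else 0.0) - u[i] * u[j] / s for j in range(d)]
            for i in range(d)]

def matmul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

u = [1.0, 2.0, -1.0, 0.5]            # hypothetical direction, d = 4
d = len(u)
P = projection_matrix(u)
PP = matmul(P, P)
trace = sum(P[i][i] for i in range(d))
idempotent = all(abs(PP[i][j] - P[i][j]) < 1e-12
                 for i in range(d) for j in range(d))
print(idempotent, trace)
```

For this four-dimensional example the trace is 3, matching the $d - 1$ degrees of freedom in the statement.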

Appendix 2 
Proof of Theorem 5 

First, we calculate the power function of the $D_f$-based test. 

Proof. Let $E[\cdot]$ be the expectation under the probability $p_d(x) = p(x)$. The 
equality $p_n^{(m)}(x) = p_d(x) r(x;\theta_m)$ leads to $E[\nabla r(x;\theta^*)]^T h = 0$. Indeed 

$$\int p_n^{(m)}(x)\,dx = \int p_d(x) r(x;\theta_m)\,dx \;\Longrightarrow\; 1 = 1 + E[\nabla r(x;\theta^*)]^T \frac{h_m}{\sqrt{m}} + o(1/\sqrt{m})$$

holds, and thus we have $E[\nabla r(x;\theta^*)]^T h = 0$ when $m$ tends to infinity. Let 
$M$ be $M(\theta^*) = E[\nabla r(x;\theta^*) \nabla r(x;\theta^*)^T]$, $\mu$ be $E[\nabla r(x;\theta^*)]$, and 
$\sqrt{V}$ be a matrix such that $\sqrt{V} \sqrt{V}^T = V[\nabla r]$. Let $\delta\theta_m$ be 
$\hat{\theta} - \theta_m$. Under Assumption 1 and Assumption 2, the asymptotic expansion 
provides 

$$\frac{2m}{f''(1)} D_f = (\sqrt{m}\,\delta\theta_m + h_m)^T M (\sqrt{m}\,\delta\theta_m + h_m) + o_p(1)$$
$$\quad = (\sqrt{m}\, U(\theta_m) \delta\theta_m + U(\theta_m) h)^T U(\theta_m)^{-1} M U(\theta_m)^{-1} (\sqrt{m}\, U(\theta_m) \delta\theta_m + U(\theta_m) h) + o_p(1)$$
$$\quad \xrightarrow{d} \| \sqrt{M}^{-1} \sqrt{V} Z_d + \sqrt{M} h \|^2.$$

In the same way as the proof of Theorem 4, we see that $\sqrt{M}^{-1} \sqrt{V}$ is the 
projection matrix along the vector $\sqrt{M}^{-1} \mu$. Moreover, $\sqrt{M} h$ is orthogonal 
to the vector $\sqrt{M}^{-1} \mu$ since $\mu^T h = 0$ holds. As a result, we see that the 
distribution of $\| \sqrt{M}^{-1} \sqrt{V} Z_d + \sqrt{M} h \|^2$ is the non-central chi-square 
distribution with $d - 1$ degrees of freedom and non-centrality parameter 
$h^T M(\theta^*) h$. □ 
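
The limiting power implied by this result is the probability that a non-central chi-square variable exceeds the upper quantile of the central chi-square distribution with the same degrees of freedom. The following Monte Carlo sketch approximates that probability with the standard library only. The function name `simulate_power`, the use of an empirical quantile as critical value, and the non-centrality value 10 are assumptions chosen for illustration; in the experiments of Section 6 the model has $d = 21$ parameters, so the degrees of freedom would be 20.

```python
import math
import random

def simulate_power(df, noncentrality, level=0.05, draws=20000, seed=0):
    """Monte Carlo estimate of Pr{chi2_df(noncentrality) > (1 - level) quantile of chi2_df}."""
    rng = random.Random(seed)

    def draw(shift):
        # Sum of df squared normals; only the first coordinate carries the shift,
        # so the non-centrality parameter is shift**2.
        total = (rng.gauss(0.0, 1.0) + shift) ** 2
        for _ in range(df - 1):
            total += rng.gauss(0.0, 1.0) ** 2
        return total

    central = sorted(draw(0.0) for _ in range(draws))
    critical = central[int((1.0 - level) * draws)]   # empirical critical value
    shift = math.sqrt(noncentrality)
    hits = sum(draw(shift) > critical for _ in range(draws))
    return hits / draws

print(simulate_power(20, 0.0), simulate_power(20, 10.0))
```

With zero non-centrality the estimate stays near the significance level, and it grows with $h^T M(\theta^*) h$, which is how the local alternative enters the power.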

Next, we calculate the power function of the empirical likelihood test. The 
notations $M$ and $\mu$ are the same as in the proof above. 

Proof. From the definition of the statistic $S$, we have 

$$S = m (\hat{\beta} - \beta^*)^T V_n[\nabla_\beta \phi] (\hat{\beta} - \beta^*) = m (\hat{\theta} - \theta^*)^T V (\hat{\theta} - \theta^*) + o_p(1),$$

where $V = V[\nabla r]$. Then we have 

$$m (\hat{\theta} - \theta^*)^T V (\hat{\theta} - \theta^*) + o_p(1)$$
$$\quad = (\sqrt{m}\, U(\theta_m) \delta\theta_m + U(\theta_m) h)^T U(\theta_m)^{-1} V U(\theta_m)^{-1} (\sqrt{m}\, U(\theta_m) \delta\theta_m + U(\theta_m) h) + o_p(1)$$
$$\quad \xrightarrow{d} \| \sqrt{V}^T \sqrt{M}^{-1} ( \sqrt{M}^{-1} \sqrt{V} Z_d + \sqrt{M} h ) \|^2.$$

The matrix $\sqrt{V}^T \sqrt{M}^{-1}$ is the projection matrix along the vector 
$\sqrt{M}^{-1} \mu$, and $\mu^T h = 0$ holds. Then we see that the vector 
$\sqrt{M}^{-1} \sqrt{V} Z_d + \sqrt{M} h$ is orthogonal to $\sqrt{M}^{-1} \mu$. This implies 

$$\| \sqrt{V}^T \sqrt{M}^{-1} ( \sqrt{M}^{-1} \sqrt{V} Z_d + \sqrt{M} h ) \|^2 = \| \sqrt{M}^{-1} \sqrt{V} Z_d + \sqrt{M} h \|^2.$$

Thus, under the local alternative setup, the limit distribution of the test 
statistic $S$ is the non-central chi-square distribution with the same parameter 
as the $D_f$-based test. □ 

Appendix 3 
Proof of Theorem 6 

Below, the notations $M = E[\nabla r(x;\theta^*) \nabla r(x;\theta^*)^T]$ and 
$\mu = E[\nabla r(x;\theta^*)]$ are used. 

Proof. From the definition of the density $p_n^{(m)}(x)$, we have 

$$\int p_n^{(m)}(x)\,dx = \int p_d(x) \left( r(x;\theta_m) + \frac{s_m(x) + \varepsilon_m}{\sqrt{m}} \right) dx$$
$$\Longrightarrow\; 1 = 1 + E[\nabla r(x;\theta^*)]^T \frac{h_m}{\sqrt{m}} + \frac{\varepsilon_m}{\sqrt{m}} + o(1/\sqrt{m}),$$

and thus the equality $\mu^T h + \varepsilon = 0$ holds when $m$ tends to infinity. Let the 
random vector $W$ be 

$$W = P Z_d + \sqrt{M} h, \quad Z_d \sim N_d(0, I_d),$$

where $P$ is the projection matrix along the vector $\sqrt{M}^{-1} \mu$ as defined in the 
proof of Theorem 4. According to the proof of Theorem 5 in Appendix 2, the 
power of the $D_f$-based test is asymptotically equal to 
$\Pr\{ \|W\|^2 > \chi^2_{d-1}(1 - \alpha) \}$, and that of the empirical likelihood test is 
equal to $\Pr\{ \|PW\|^2 > \chi^2_{d-1}(1 - \alpha) \}$. 
We have the equality $W = PW + c \sqrt{M}^{-1} \mu$ with some $c \in \mathbb{R}$. Note that 
generally $\sqrt{M} h$ is not orthogonal to $\sqrt{M}^{-1} \mu$ in the misspecified case, since 

$$(\sqrt{M}^{-1} \mu)^T \sqrt{M} h = \mu^T h = -\varepsilon$$

holds. For $\varepsilon \neq 0$, we have $c \neq 0$ and then the inequality $\|W\|^2 > \|PW\|^2$ 
holds. As a result, the power of the $D_f$-based test is larger than or equal to that 
of empirical likelihood test under the misspecified setup. □ 

Table 2: Averaged power functions over 300 runs are shown as functions 
of the mean parameter of the probability $p_d(x)$, where $p_d(x)$ is defined by 
(23) through the probability $p_n$. Normal distribution, t-distribution with 
10 degrees of freedom, and t-distribution with 5 degrees of freedom are 
examined as $p_n$. Below, "MI", "KL", "emp." and "Hote." denote MI-based 
test, KL-based test, empirical likelihood test and Hotelling $T^2$-test, 
respectively. 

$m_n = m_d = 500$

          10-dim. Normal                10-dim. t-dist. (df=10)       10-dim. t-dist. (df=5)
$\mu$     MI     KL     emp.   Hote.    MI     KL     emp.   Hote.    MI     KL     emp.   Hote.
-0.1      0.894  0.902  0.898  0.964    0.812  0.826  0.822  0.886    0.680  0.724  0.746  0.750
-0.08     0.650  0.662  0.654  0.778    0.538  0.572  0.608  0.714    0.472  0.532  0.592  0.562
-0.06     0.362  0.388  0.384  0.510    0.302  0.328  0.360  0.418    0.236  0.296  0.408  0.258
-0.04     0.184  0.190  0.214  0.226    0.132  0.156  0.200  0.176    0.130  0.186  0.284  0.134
-0.02     0.084  0.100  0.104  0.074    0.080  0.104  0.148  0.080    0.082  0.132  0.216  0.072
0         0.046  0.058  0.062  0.036    0.062  0.082  0.098  0.046    0.054  0.090  0.170  0.056
0.02      0.072  0.080  0.092  0.070    0.064  0.076  0.104  0.044    0.090  0.150  0.218  0.074
0.04      0.196  0.206  0.212  0.210    0.138  0.160  0.186  0.158    0.138  0.200  0.304  0.130
0.06      0.374  0.398  0.424  0.490    0.314  0.348  0.372  0.388    0.260  0.332  0.380  0.274
0.08      0.658  0.688  0.698  0.760    0.528  0.554  0.586  0.632    0.470  0.536  0.578  0.528
0.1       0.866  0.878  0.870  0.954    0.796  0.810  0.814  0.878    0.672  0.740  0.750  0.760

$m_n = m_d = 1000$

          10-dim. Normal                10-dim. t-dist. (df=10)       10-dim. t-dist. (df=5)
$\mu$     MI     KL     emp.   Hote.    MI     KL     emp.   Hote.    MI     KL     emp.   Hote.
-0.1      0.996  0.998  0.998  1.000    0.996  0.996  0.998  0.994    0.958  0.964  0.968  0.990
-0.08     0.952  0.954  0.954  0.986    0.902  0.906  0.906  0.960    0.790  0.816  0.824  0.864
-0.06     0.694  0.698  0.698  0.794    0.616  0.634  0.652  0.784    0.470  0.516  0.550  0.594
-0.04     0.320  0.336  0.336  0.422    0.258  0.278  0.304  0.340    0.208  0.246  0.316  0.232
-0.02     0.096  0.110  0.122  0.132    0.080  0.090  0.102  0.110    0.094  0.128  0.220  0.100
0         0.058  0.060  0.064  0.044    0.058  0.068  0.102  0.052    0.074  0.100  0.166  0.068
0.02      0.088  0.090  0.098  0.100    0.112  0.120  0.142  0.114    0.092  0.128  0.194  0.078
0.04      0.308  0.322  0.324  0.472    0.296  0.318  0.324  0.396    0.222  0.258  0.314  0.248
0.06      0.724  0.730  0.728  0.836    0.622  0.640  0.652  0.752    0.474  0.500  0.538  0.586
0.08      0.956  0.960  0.958  0.978    0.890  0.900  0.904  0.962    0.770  0.790  0.818  0.856
0.1       0.998  0.996  0.998  1.000    0.992  0.990  0.988  0.998    0.966  0.970  0.980  0.988

Table 3: Averaged power functions over 300 runs are shown as functions 
of the scale parameter of the probability $p_d(x)$, where $p_d(x)$ is defined by 
(24) through the probability $p_n$. Normal distribution, t-distribution with 
10 degrees of freedom, and t-distribution with 5 degrees of freedom are 
examined as $p_n$. Below, "MI", "KL", "emp." and "Hote." denote MI-based 
test, KL-based test, empirical likelihood test and Hotelling $T^2$-test, 
respectively. 

$m_n = m_d = 500$

          10-dim. Normal                10-dim. t-dist. (df=10)       10-dim. t-dist. (df=5)
$\sigma$  MI     KL     emp.   Hote.    MI     KL     emp.   Hote.    MI     KL     emp.   Hote.
0.9       1.000  0.998  0.994  0.042    0.986  0.978  0.912  0.070    0.846  0.788  0.484  0.044
0.92      0.976  0.976  0.948  0.046    0.850  0.786  0.638  0.058    0.592  0.492  0.204  0.036
0.94      0.750  0.714  0.554  0.044    0.552  0.486  0.282  0.034    0.328  0.272  0.110  0.042
0.96      0.354  0.328  0.184  0.054    0.240  0.186  0.096  0.048    0.208  0.172  0.094  0.054
0.98      0.112  0.096  0.054  0.050    0.078  0.070  0.054  0.042    0.084  0.114  0.142  0.060
1         0.064  0.078  0.080  0.048    0.052  0.074  0.098  0.042    0.066  0.086  0.170  0.030
1.02      0.102  0.146  0.212  0.044    0.104  0.164  0.248  0.056    0.090  0.182  0.338  0.054
1.04      0.334  0.406  0.516  0.046    0.218  0.316  0.490  0.050    0.158  0.314  0.514  0.050
1.06      0.666  0.744  0.840  0.050    0.516  0.670  0.818  0.060    0.324  0.528  0.780  0.054
1.08      0.946  0.976  0.992  0.044    0.806  0.876  0.948  0.038    0.538  0.716  0.862  0.060
1.1       0.992  0.994  0.998  0.064    0.966  0.992  0.998  0.032    0.774  0.868  0.974  0.046

$m_n = m_d = 1000$

          10-dim. Normal                10-dim. t-dist. (df=10)       10-dim. t-dist. (df=5)
$\sigma$  MI     KL     emp.   Hote.    MI     KL     emp.   Hote.    MI     KL     emp.   Hote.
0.9       1.000  1.000  1.000  0.062    1.000  1.000  1.000  0.046    0.992  0.986  0.914  0.056
0.92      1.000  1.000  1.000  0.074    0.998  0.996  0.984  0.052    0.892  0.854  0.638  0.054
0.94      0.982  0.980  0.968  0.054    0.912  0.892  0.766  0.074    0.620  0.532  0.294  0.054
0.96      0.648  0.608  0.502  0.052    0.464  0.412  0.278  0.040    0.264  0.214  0.118  0.058
0.98      0.148  0.132  0.104  0.042    0.118  0.104  0.072  0.054    0.108  0.098  0.080  0.058
1         0.046  0.050  0.058  0.030    0.054  0.060  0.074  0.040    0.066  0.088  0.164  0.048
1.02      0.170  0.200  0.256  0.058    0.120  0.158  0.256  0.040    0.096  0.158  0.310  0.046
1.04      0.678  0.732  0.806  0.060    0.416  0.532  0.650  0.070    0.272  0.424  0.612  0.058
1.06      0.978  0.984  0.988  0.048    0.870  0.910  0.958  0.052    0.516  0.722  0.856  0.056
1.08      1.000  1.000  1.000  0.048    0.992  0.998  1.000  0.054    0.850  0.934  0.970  0.060
1.1       1.000  1.000  1.000  0.066    0.998  1.000  1.000  0.056    0.968  0.982  0.996  0.046